LIME analysis for the heart disease dataset.
Below, we can see LIME decompositions for an XGBoost model. The two samples were previously analysed in Homework 2.


It seems that the LIME values are quite stable for the XGBoost model.
Below, we can see two decompositions for the same sample, picked as the most different ones from a sample of 20.
For all the sampled explanations, the set of top 3 variables remained unchanged. The values don't change much, and there are no cases of the same variable getting different signs. The IOU between the sets of top 10 predictors is 0.82: just one variable differs -- thalachh in one run versus restecg_1 in the other, both in the lower part of the top 10.
We can conclude that, as long as the values are of an order of magnitude that makes them meaningful, sampling noise does not change the results much.
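The top-k overlap used above can be quantified with a Jaccard index (IOU) over the sets of variable names. A minimal sketch, with hypothetical variable names and effect values standing in for two LIME runs on the same sample:

```python
def top_k_iou(effects_a, effects_b, k=10):
    """Jaccard index (IOU) between the top-k variables of two explanations.

    effects_a / effects_b map variable names to LIME effect values;
    variables are ranked by absolute effect.
    """
    def top(effects):
        return set(sorted(effects, key=lambda v: abs(effects[v]), reverse=True)[:k])

    a, b = top(effects_a), top(effects_b)
    return len(a & b) / len(a | b)


# Toy effects for two runs on the same sample (illustrative values only):
run1 = {"cp_2": 0.30, "thall_2": 0.25, "caa_1": -0.20, "thalachh": 0.05}
run2 = {"cp_2": 0.28, "thall_2": 0.26, "caa_1": -0.19, "restecg_1": 0.04}
print(top_k_iou(run1, run2, k=3))  # top-3 sets are identical -> 1.0
```

In the real analysis, the effect dictionaries would be built from the `result` dataframe of each `predict_surrogate` call.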


For the logistic regression model, the differences between runs are much more visible.
Below, we can see two explanations for one and the same sample. The second explanation was chosen as the one with the largest absolute explanation value in the entire test set. As such, it is in a sense an outlier, and the differences between explanations represent an extreme case, not an average one. We can also note that the values are much smaller than in the case of the XGBoost model, which makes them less meaningful and reliable.
With that noted, we can see that the differences here are extreme, most visibly for sex_1 and restecg_1.

Below, we can see the LIME and SHAP analyses for a selected sample.
We can see that the rankings of variables differ. The sets of top variables are still similar, with an IOU of 0.58; just 2 of the top SHAP variables are not included in the LIME explanation. There are no contradictions between the signs of the two explanations -- for variables that appear in both graphs, the signs of their contributions agree.


Below, we have the explanations for the other previously chosen sample. Again, although the ordering of the features differs, there are no variables that have different signs in the two explanations. The IOU is 0.73, with just one variable from SHAP not appearing in LIME (exng_1).
Taking into account the variability of LIME explanations, we can conclude that, although the values can vary considerably in unlucky cases, in an average case we can expect the two methods to yield reasonably similar results, with no contradictions. The main differences are the scale of the values and their interpretability: unlike SHAP contributions, the LIME effects do not sum up to a meaningful value.
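The sign-agreement check used in both comparisons above can be sketched as a small helper. The variable names and values below are illustrative, not taken from the actual explanations:

```python
def sign_conflicts(lime_effects, shap_effects):
    """Return variables that appear in both explanations with opposite signs.

    Both arguments map variable names to contribution values.
    """
    shared = set(lime_effects) & set(shap_effects)
    return sorted(v for v in shared if lime_effects[v] * shap_effects[v] < 0)


# Toy LIME and SHAP contributions for the same sample (illustrative values):
lime = {"cp_2": 0.30, "thall_2": 0.25, "caa_1": -0.20}
shap = {"cp_2": 0.10, "thall_2": 0.08, "caa_1": -0.06, "exng_1": -0.05}
print(sign_conflicts(lime, shap))  # -> [] (no contradictions)
```

An empty result corresponds to the "no contradictions" observation above; any returned names would be variables the two methods disagree on.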


Even from the example above in the stability section, where the same observation was explained for both models, we can see that the LIME values vary considerably between the two models.
I have gathered LIME explanations for the entire test set (61 samples) for two models: XGBoost and logistic regression. Below, we can see a summary of two dataframes collecting the maximal absolute effect value for each sample -- first for the XGBoost model, and then for the logistic regression.
We can see that the orders of magnitude of the values differ between the two models. LIME seems unable to find much of an explanation for the logistic regression model, with the values being near zero.
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import xgboost
import dalex as dx
import plotly.express as px
from scipy.stats import zscore
import warnings
warnings.filterwarnings("ignore")
DATA_PATH = "../heart.csv"
CATEGORICAL_COLUMNS = ['sex', 'cp', 'fbs', 'restecg', 'caa', 'exng', 'slp', 'thall']
TARGET_COLUMN = 'output'
df = pd.read_csv(DATA_PATH)
df.head()
df.describe()
for column in CATEGORICAL_COLUMNS:
    df[column] = df[column].astype(str)
df = pd.get_dummies(df, drop_first=True)
df.columns
X = df.drop(columns = [TARGET_COLUMN])
y = df[TARGET_COLUMN]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler().fit(X_train)
X_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
logistic_regression = LogisticRegression().fit(X_scaled, y_train)
xgboost_model = xgboost.XGBClassifier(
    n_estimators=200,
    max_depth=4,
    use_label_encoder=False,
    eval_metric="logloss"
)
xgboost_model.fit(X_train, y_train)
The selected model here is XGBoost.
k = 20
x1, y1 = X_test.iloc[0:1], y_test.iloc[0:1]
x2, y2 = X_test.iloc[k:k+1], y_test.iloc[k:k+1]
for x, y in zip((x1, x2), (y1, y2)):
    print(y)
    print(xgboost_model.predict_proba(x))
    print(xgboost_model.predict(x))
The selected package is dalex.
explainer = dx.Explainer(xgboost_model, X_train, y_train)
for sample in (x1, x2):
    explanation = explainer.predict_surrogate(sample)
    explanation.plot()
num_samples = 5
X_sample = [X_test.iloc[i:i+1] for i in range(num_samples)]
y_sample = [y_test.iloc[i:i+1] for i in range(num_samples)]
for sample in X_sample:
    explanation = explainer.predict_surrogate(sample)
    explanation.plot()
pf_xgboost_classifier_default = lambda m, d: m.predict_proba(d)[:, 1]
shap_explainer = dx.Explainer(xgboost_model, X_train, predict_function=pf_xgboost_classifier_default, label="XGBoost")
sample = x1
explainer.predict_surrogate(sample).plot()
shap_explainer.predict_parts(sample, type="shap").plot()
sample = x2
explainer.predict_surrogate(sample).plot()
shap_explainer.predict_parts(sample, type="shap").plot()
logistic_explainer = dx.Explainer(logistic_regression, X_train, y_train)
for sample in (x1, x2):
    explanation = logistic_explainer.predict_surrogate(sample)
    explanation.plot()
num_samples = len(X_test)
X_sample = [X_test.iloc[i:i+1] for i in range(num_samples)]
y_sample = [y_test.iloc[i:i+1] for i in range(num_samples)]
xgboost_results = []
logistic_results = []
xgboost_expls = {}
logistic_expls = {}
for sample in X_sample:
    idx = sample.index[0]
    expl = explainer.predict_surrogate(sample)
    res = expl.result
    res["sample_id"] = idx
    xgboost_results.append(res)
    xgboost_expls[idx] = expl
    expl = logistic_explainer.predict_surrogate(sample)
    res = expl.result
    res["sample_id"] = idx
    logistic_results.append(res)
    logistic_expls[idx] = expl
xgboost_results = pd.concat(xgboost_results)
logistic_results = pd.concat(logistic_results)
xgboost_results["abs_effect"] = np.abs(xgboost_results.effect)
logistic_results["abs_effect"] = np.abs(logistic_results.effect)
fig = px.histogram(x=xgboost_results.effect)
fig.show()
fig = px.histogram(x=xgboost_results.groupby("sample_id").abs_effect.max())
fig.show()
fig = px.histogram(x=logistic_results.groupby("sample_id").abs_effect.max())
fig.show()
fig = px.histogram(data_frame=logistic_results, x="effect")
fig.show()
xgboost_results.describe()
logistic_results.describe()
xgboost_results.groupby("sample_id").abs_effect.max().describe()
logistic_results.groupby("sample_id").abs_effect.max().describe()
df = logistic_results.groupby("sample_id").max()
df[df.abs_effect == df.abs_effect.max()]
logistic_expls[160].plot()
x_max = X_test[X_test.index == 160]
res = logistic_explainer.predict_surrogate(x_max)
res.plot()
res = explainer.predict_surrogate(x_max)
res.plot()
x_max
res.result